Online Experiments: Practical Lessons

Authors

  • Ron Kohavi
  • Roger Longbotham
  • Toby Walker
Abstract

From ancient times through the 19th century, physicians used bloodletting to treat acne, cancer, diabetes, jaundice, plague, and hundreds of other diseases and ailments (D. Wooton, Doctors Doing Harm since Hippocrates, Oxford Univ. Press, 2006). It was judged most effective to bleed patients while they were sitting upright or standing erect, and blood was often removed until the patient fainted. On 12 December 1799, 67-year-old former President George Washington rode his horse in heavy snowfall to inspect his plantation at Mount Vernon. A day later, he was in respiratory distress, and his doctors extracted nearly half of his blood over 10 hours, causing anemia and hypotension; he died that night. Today, we know that bloodletting is unhelpful because in 1828 a Parisian doctor named Pierre Louis ran a controlled experiment. He treated 78 people suffering from pneumonia with either early and frequent bloodletting or less aggressive measures, and found that bloodletting didn't improve survival rates or recovery times.

Having roots in agriculture and medicine, controlled experiments have spread into the online world of websites and services. In an earlier Web Technologies article (R. … Platform team introduced basic practices of good online experimentation. Three years later, having run hundreds of experiments on more than 20 websites, including some of the world's largest, like msn.com and bing.com, we have learned some important practical lessons about the limitations of standard statistical formulas and about data traps. These lessons, even for seemingly simple univariate experiments, aren't taught in Statistics 101. After reading this article, we hope you'll have better negative introspection: to know what you don't know.

In an online controlled experiment, users are randomly assigned to two or more groups for some period of time and exposed to different variants of the website. The most common online experiment, the A/B test, has two variants: the A version of the site is the control and the B version is the treatment. The experimenters define an overall evaluation criterion (OEC) and compute a statistic (for example, the mean of the OEC) for each variant. The OEC statistic is also referred to as a key performance indicator (KPI); in statistics, the OEC is often called the response or dependent variable. The difference between the OEC statistic for the treatment and control groups is the treatment effect. If the experiment was designed and executed properly, the only thing consistently different between the two variants is the …
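The definitions above (random assignment, a per-variant OEC statistic, and the treatment effect as their difference) can be illustrated with a minimal sketch. It assumes the OEC statistic is the mean and that each user has exactly one recorded OEC value; the function names `assign_users` and `treatment_effect` and the sample numbers are hypothetical illustrations, not taken from the article, and the sketch says nothing about whether an observed difference is statistically significant.

```python
import random
from statistics import mean

VARIANTS = ("control", "treatment")  # A = control, B = treatment


def assign_users(user_ids, seed=0):
    """Randomly assign each user to one variant (seeded only for reproducibility)."""
    rng = random.Random(seed)
    return {uid: rng.choice(VARIANTS) for uid in user_ids}


def treatment_effect(assignment, oec_by_user):
    """Return mean OEC of treatment minus mean OEC of control.

    assignment:  dict of user id -> "control" or "treatment"
    oec_by_user: dict of user id -> observed OEC value (assumed one per user)
    Raises statistics.StatisticsError if either group is empty.
    """
    groups = {name: [] for name in VARIANTS}
    for uid, variant in assignment.items():
        groups[variant].append(oec_by_user[uid])
    return mean(groups["treatment"]) - mean(groups["control"])


if __name__ == "__main__":
    # Hypothetical data: a hand-written assignment keeps the arithmetic easy to follow.
    assignment = {"u1": "control", "u2": "control",
                  "u3": "treatment", "u4": "treatment"}
    oec = {"u1": 1.0, "u2": 3.0, "u3": 4.0, "u4": 6.0}
    # Control mean is 2.0 and treatment mean is 5.0, so the effect is 3.0.
    print(treatment_effect(assignment, oec))
```

In practice `assign_users` would supply the assignment; the hand-written one here only makes the arithmetic checkable by eye.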

Similar resources

Online Field Experiments: Lessons from CommunityLab

We report briefly on a set of online field experiments conducted as part of the CommunityLab collaborative research project. Based on these projects, and the published research literature, we present an analysis of the design choices for online field experiments and report on lessons learned.

Evaluation and comparison the results comprehensive Exam and the mean scores of Basic sciences courses of Isfahan medical students before and after the changes of basic science courses

Introduction: The aim of this study is the evaluation of medical students’ academic achievement after the changes in arrangement and courses of some Basic sciences lessons in school of medicine and comparing their academic achievement before and after the changes in arrangement and the courses. Methods: In this descriptive analytical study 156 samples were selected from 2004 (group 1) and 2005...

Learning Analytics and Educational Games: Lessons Learned from Practical Experience

Learning Analytics (LA) is an emerging discipline focused on obtaining information by analyzing students’ interactions with on-line educational contents. Data is usually collected from online activities such as forums or virtualized courses hosted on Learning Management Systems (e.g. Moodle). Educational games are emerging as a popular type of e-learning content and their high interactivity mak...

TweetGenie: Development, Evaluation, and Lessons Learned

TweetGenie is an online demo that infers the gender and age of Twitter users based on their tweets. TweetGenie was able to attract thousands of visitors. We collected data by asking feedback from visitors and launching an online game. In this paper, we describe the development of TweetGenie and evaluate the demo based on the received feedback and manual annotation. We also reflect on practical ...

Journal:
  • IEEE Computer

Volume 43

Publication date: 2010